LIFT: Learned Invariant Feature Transform
We introduce a novel Deep Network architecture that implements the full
feature point handling pipeline, that is, detection, orientation estimation,
and feature description. While previous works have successfully tackled each
one of these problems individually, we show how to learn to do all three in a
unified manner while preserving end-to-end differentiability. We then
demonstrate that our Deep pipeline outperforms state-of-the-art methods on a
number of benchmark datasets, without the need for retraining.
Comment: Accepted to ECCV 2016 (spotlight)
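As a rough illustration of the pipeline structure described above, the sketch below chains detection, orientation estimation, and description while keeping every step differentiable. The tiny networks, the soft-argmax keypoint selection, and the single-keypoint simplification are assumptions made for brevity, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyFeaturePipeline(nn.Module):
    """Detect -> estimate orientation -> describe, in one differentiable pass."""
    def __init__(self, patch=32):
        super().__init__()
        self.patch = patch
        self.detector = nn.Conv2d(1, 1, 5, padding=2)   # image -> score map
        self.orienter = nn.Sequential(nn.Flatten(), nn.Linear(patch * patch, 1))
        self.descriptor = nn.Sequential(nn.Flatten(), nn.Linear(patch * patch, 128))

    def crop(self, image, cx, cy, angle):
        # Differentiable crop (and rotation) around (cx, cy) via an affine grid.
        scale = self.patch / image.shape[-1]
        cos, sin = torch.cos(angle) * scale, torch.sin(angle) * scale
        row1 = torch.cat([cos, -sin, cx.unsqueeze(1)], dim=1)
        row2 = torch.cat([sin, cos, cy.unsqueeze(1)], dim=1)
        theta = torch.stack([row1, row2], dim=1)        # (B, 2, 3) affine transforms
        grid = F.affine_grid(theta, (image.shape[0], 1, self.patch, self.patch),
                             align_corners=False)
        return F.grid_sample(image, grid, align_corners=False)

    def forward(self, image):
        scores = self.detector(image)                                # 1) detection
        w = F.softmax(scores.flatten(1), dim=1).view_as(scores)      # soft-argmax weights
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, image.shape[-2]),
                                torch.linspace(-1, 1, image.shape[-1]), indexing="ij")
        cx, cy = (w * xs).sum(dim=(1, 2, 3)), (w * ys).sum(dim=(1, 2, 3))
        upright = self.crop(image, cx, cy, torch.zeros(image.shape[0], 1))
        angle = self.orienter(upright)                               # 2) orientation
        patch = self.crop(image, cx, cy, angle)                      # canonical patch
        return torch.stack([cx, cy], dim=1), self.descriptor(patch)  # 3) description

xy, desc = ToyFeaturePipeline()(torch.rand(1, 1, 128, 128))
```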
Neural Fourier Filter Bank
We present a novel method to provide efficient and highly detailed
reconstructions. Inspired by wavelets, we learn a neural field that decomposes
the signal both spatially and frequency-wise. We follow the recent grid-based
paradigm for spatial decomposition, but unlike existing work, we encourage
specific frequencies to be stored in each grid via Fourier feature encodings.
We then apply a multi-layer perceptron with sine activations, feeding these
Fourier-encoded features into the appropriate layers so that higher-frequency
components are accumulated sequentially on top of lower-frequency components,
which we then sum to form the final output. We demonstrate that our method
outperforms the state of the art regarding model compactness and convergence
speed on multiple tasks: 2D image fitting, 3D shape reconstruction, and neural
radiance fields. Our code is available at https://github.com/ubc-vision/NFFB
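As a toy illustration of the decomposition described above, the sketch below injects Fourier feature encodings of increasing frequency into successive layers of a sine-activated MLP and sums the per-level outputs. The fixed random Fourier bases, layer sizes, and the absence of the multi-scale grids are simplifying assumptions; the repository linked above contains the actual architecture.

```python
import math
import torch
import torch.nn as nn

class ToyFourierFilterBank(nn.Module):
    def __init__(self, in_dim=2, hidden=64, out_dim=3, num_levels=4):
        super().__init__()
        # One fixed random Fourier basis per level, doubling in frequency.
        self.bases = [2.0 ** i * torch.randn(in_dim, hidden) for i in range(num_levels)]
        self.layers = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(num_levels))
        self.heads = nn.ModuleList(nn.Linear(hidden, out_dim) for _ in range(num_levels))

    def forward(self, x):
        h, out = 0.0, 0.0
        for basis, layer, head in zip(self.bases, self.layers, self.heads):
            enc = torch.sin(2 * math.pi * x @ basis)   # Fourier features for this level
            h = torch.sin(layer(h + enc))              # sine-activated layer, accumulating
            out = out + head(h)                        # sum per-level contributions
        return out

coords = torch.rand(1024, 2)                           # e.g. 2D pixel coordinates in [0, 1]
rgb = ToyFourierFilterBank()(coords)                   # one predicted value per coordinate
```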
Learning to Find Good Correspondences
We develop a deep architecture to learn to find good correspondences for
wide-baseline stereo. Given a set of putative sparse matches and the camera
intrinsics, we train our network in an end-to-end fashion to label the
correspondences as inliers or outliers, while simultaneously using them to
recover the relative pose, as encoded by the essential matrix. Our architecture
is based on a multi-layer perceptron operating on pixel coordinates rather than
directly on the image, and is thus simple and small. We introduce a novel
normalization technique, called Context Normalization, which allows us to
process each data point separately while imbuing it with global information,
and also makes the network invariant to the order of the correspondences. Our
experiments on multiple challenging datasets demonstrate that our method is
able to drastically improve the state of the art with little training data.
Comment: CVPR 2018 (Oral)
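A minimal sketch of the Context Normalization idea described above, under assumed shapes and layer sizes: each putative correspondence is processed independently by shared weights, but its activations are normalized with statistics computed over all correspondences of the same image pair, which injects global context while keeping the network invariant to their order.

```python
import torch
import torch.nn as nn

def context_norm(x, eps=1e-3):
    # x: (batch, num_correspondences, channels); statistics are taken over the
    # whole set of correspondences, then applied to each point separately.
    mean = x.mean(dim=1, keepdim=True)
    std = x.std(dim=1, keepdim=True)
    return (x - mean) / (std + eps)

class CNBlock(nn.Module):
    """One shared perceptron layer followed by Context Normalization."""
    def __init__(self, channels=128):
        super().__init__()
        self.fc = nn.Linear(channels, channels)        # shared across correspondences
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(context_norm(self.fc(x)))

# Toy usage: 4-D inputs (x1, y1, x2, y2) for 2000 putative matches of one pair.
matches = torch.randn(1, 2000, 4)
net = nn.Sequential(nn.Linear(4, 128), CNBlock(), CNBlock(), nn.Linear(128, 1))
inlier_logits = net(matches)                           # one inlier score per match
```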
LF-Net: Learning Local Features from Images
We present a novel deep architecture and a training strategy to learn a local
feature pipeline from scratch, using collections of images without the need for
human supervision. To do so we exploit depth and relative camera pose cues to
create a virtual target that the network should achieve on one image, provided
the outputs of the network for the other image. While this process is
inherently non-differentiable, we show that we can optimize the network in a
two-branch setup by confining the non-differentiable step to one branch, while preserving
differentiability in the other. We train our method on both indoor and outdoor
datasets, with depth data from 3D sensors for the former, and depth estimates
from an off-the-shelf Structure-from-Motion solution for the latter. Our models
outperform the state of the art on sparse feature matching on both datasets,
while running at 60+ fps for QVGA images.
Comment: NIPS 2018
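The sketch below illustrates, under assumptions, the two-branch trick described above: the network runs on both images, but the second branch is evaluated without gradients and only supplies a virtual target for the first. The warp_j_to_i callable is a hypothetical stand-in for the depth- and pose-based supervision, not the authors' code.

```python
import torch

def training_step(net, image_i, image_j, warp_j_to_i, loss_fn, optimizer):
    score_i = net(image_i)                 # differentiable branch
    with torch.no_grad():
        score_j = net(image_j)             # non-differentiable branch
    target_i = warp_j_to_i(score_j)        # virtual target built from the other image
    loss = loss_fn(score_i, target_i)
    optimizer.zero_grad()
    loss.backward()                        # gradients only flow through branch i
    optimizer.step()
    return loss.item()

# Toy usage with stand-in components (identity "warp", L2 loss, tiny conv net).
net = torch.nn.Conv2d(1, 1, 3, padding=1)
opt = torch.optim.SGD(net.parameters(), lr=0.01)
img_i, img_j = torch.rand(1, 1, 60, 80), torch.rand(1, 1, 60, 80)
training_step(net, img_i, img_j, lambda s: s, torch.nn.functional.mse_loss, opt)
```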
TILDE: A Temporally Invariant Learned DEtector
We introduce a learning-based approach to detect repeatable keypoints under
drastic changes in weather and lighting conditions, to which
state-of-the-art keypoint detectors are surprisingly sensitive. We first
identify good keypoint candidates in multiple training images taken from the
same viewpoint. We then train a regressor to predict a score map whose maxima
are those points so that they can be found by simple non-maximum suppression.
As there are no standard datasets to test the influence of these kinds of
changes, we created our own, which we will make publicly available. We show
that our method significantly outperforms state-of-the-art methods in such
challenging conditions, while still achieving state-of-the-art performance on
the standard Oxford dataset, on which it was not trained.
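As a small illustration of the detection step described above, the sketch below reads keypoints off a predicted score map with simple non-maximum suppression; the learned regressor itself is omitted, and the window size and threshold are assumed values.

```python
import torch
import torch.nn.functional as F

def nms_keypoints(score_map, window=15, threshold=0.5):
    # score_map: (H, W) tensor of per-pixel scores from the learned regressor.
    s = score_map[None, None]                                 # (1, 1, H, W)
    local_max = F.max_pool2d(s, window, stride=1, padding=window // 2)
    keep = (s == local_max) & (s > threshold)                 # thresholded local maxima
    ys, xs = torch.nonzero(keep[0, 0], as_tuple=True)
    return torch.stack([xs, ys], dim=1)                       # (N, 2) keypoint coordinates

keypoints = nms_keypoints(torch.rand(240, 320))               # toy score map
```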
Layered Controllable Video Generation
We introduce layered controllable video generation, where we, without any
supervision, decompose the initial frame of a video into foreground and
background layers, with which the user can control the video generation process
by simply manipulating the foreground mask. The key challenges are the
unsupervised foreground-background separation, which is ambiguous, and the ability
to anticipate user manipulations with access to only raw video sequences. We
address these challenges by proposing a two-stage learning procedure. In the
first stage, using a rich set of losses and a dynamic foreground-size prior, we
learn how to separate the frame into foreground and background layers and,
conditioned on these layers, how to generate the next frame using a VQ-VAE
generator. In the second stage, we fine-tune this network to anticipate edits
to the mask by fitting a (parameterized) control to the mask from the future
frame.
We demonstrate the effectiveness of this learning procedure and of the more granular control
mechanism, while illustrating state-of-the-art performance on two benchmark
datasets. We provide a video abstract as well as some video results on
https://gabriel-huang.github.io/layered_controllable_video_generation
Comment: This paper has been accepted to ECCV 2022 as an Oral paper.
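A minimal sketch, under assumptions, of the layered compositing that the mask-based control relies on: a frame is modelled as a foreground layer, a background layer, and a foreground mask, and the user steers generation by editing the mask. The generator here is a placeholder for the VQ-VAE-based model, not the authors' implementation.

```python
import torch

def composite(foreground, background, mask):
    # mask in [0, 1] with shape (B, 1, H, W); layers are (B, 3, H, W) images.
    return mask * foreground + (1.0 - mask) * background

def generate_next_frame(generator, prev_frame, edited_mask):
    # The (hypothetical) generator predicts the next foreground and background
    # layers conditioned on the previous frame and the user-edited mask.
    fg, bg = generator(prev_frame, edited_mask)
    return composite(fg, bg, edited_mask)

# Stand-in usage: a dummy "generator" that reuses the previous frame as foreground.
prev = torch.rand(1, 3, 64, 64)
mask = (torch.rand(1, 1, 64, 64) > 0.5).float()       # a user-editable foreground mask
dummy_gen = lambda frame, m: (frame, torch.zeros_like(frame))
next_frame = generate_next_frame(dummy_gen, prev, mask)
```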